Disaster Survival Guide in Petascale Computing: An Algorithmic Approach
Authors
Jack J. Dongarra, Zizhong Chen, George Bosilca, and Julien Langou
Abstract
1 Disaster Survival Guide in Petascale Computing: An Algorithmic Approach
1.1 FT-MPI: A Fault Tolerant MPI Implementation
1.1.1 FT-MPI Overview
1.1.2 FT-MPI: A Fault Tolerant MPI Implementation
1.1.3 FT-MPI Usage
1.2 Application Level Diskless Checkpointing
1.2.1 Neighbor-Based Checkpointing
1.2.2 Checksum-Based Checkpointing
1.2.3 Weighted-Checksum-Based Checkpointing
1.3 A Fault Survivable Iterative Equation Solver
1.3.1 Preconditioned Conjugate Gradient Algorithm
1.3.2 Incorporating Fault Tolerance into PCG
1.4 Experimental Evaluation
1.4.1 Performance of PCG with Different MPI Implementations
1.4.2 Performance Overhead of Taking Checkpoint
1.4.3 Performance Overhead of Performing Recovery
1.4.4 Numerical Impact of Round-Off Errors in Recovery
1.5 Discussion
1.6 Conclusion and Future Work
Similar resources
DAG-Based Software Frameworks for PDEs
The task-based approach to software and parallelism is well-known and has been proposed as a potential candidate, named the silver model, for exascale software. This approach is not yet widely used in the large-scale multi-core parallel computing of complex systems of partial differential equations. After surveying task-based approaches we investigate how well the Uintah software and an extensi...
Petascale algorithms for reactor hydrodynamics
We describe recent algorithmic developments that have enabled large eddy simulations of reactor flows on up to P = 65,000 processors on the IBM BG/P at the Argonne Leadership Computing Facility.
Petascale Computing for Future Breakthroughs in Global Seismology
Will the advent of “petascale” computers be relevant to research in global seismic tomography? We illustrate here in detail two possible consequences of the expected leap in computing capability. First, being able to identify larger sets of differently regularized/parameterized solutions in shorter times will make it possible to evaluate their relative quality by more accurate statistical criteria than in...
Abstractions and Middleware for Petascale Computing and Beyond
As high-performance computing moves to the petascale and beyond, a number of algorithmic and software challenges need to be addressed. This paper reviews the main performance-limiting factors in today’s high-performance computing software and outlines a possible new programming paradigm to address them. The proposed paradigm is based on abstract parallel data structures and operations that enca...
A robust optimization model for distribution and evacuation in the disaster response phase
Natural disasters, such as earthquakes, affect thousands of people and can cause enormous financial loss. Therefore, an efficient response immediately following a natural disaster is vital to minimize the aforementioned negative effects. This research paper presents a network design model for humanitarian logistics which will assist in location and allocation decisions for multiple disaster per...